---
title: IO Managers | Dagster
description: IO Managers determine how to store solid outputs and load solid inputs.
---

# IO Managers

IO Managers are user-provided objects that store solid outputs and load them as inputs to downstream solids.

<div className="w-full h-48 bg-gray-50 flex items-center justify-center">
  TODO diagram: a pipeline, several solids, io managers between solids and point
  to storage.
</div>

## Relevant APIs

| Name                                                                        | Description                                          |
| --------------------------------------------------------------------------- | ---------------------------------------------------- |
| <PyObject module="dagster" object="io_manager" displayText="@io_manager" /> | The decorated function used to define an IO manager. |
| <PyObject module="dagster" object="IOManager" />                            | Base class for user-provided IO managers.            |

## Overview

Dagster solids have [inputs and outputs](/concepts/solids-pipelines/solids#inputs-and-outputs).
When a solid produces an output, what happens to it? When a solid needs an input, how is it loaded?

There are different ways you can use inputs and outputs in Dagster:

- **Solid outputs are connected to downstream inputs**: in this case, <PyObject module="dagster" object="IOManager" />s
  let you decide how solid outputs are stored and loaded in downstream solids.
- **Solid inputs are unconnected**: <PyObject module="dagster" object="DagsterTypeLoader" />s
  and <PyObject module="dagster" object="RootInputManager" />s (experimental) let you decide how
  inputs at the beginning of a pipeline are loaded. More information can be found
  in [Unconnected Inputs](/concepts/io-management/unconnected-inputs).

These APIs make it easy to separate code that's responsible for logical data transformation from code
that's responsible for reading and writing the results. Solids can focus on business logic, while IO
managers handle I/O. This separation makes it easier to test the business logic and run it in
different environments.

This page will focus on cases where solid outputs are connected to downstream inputs.

### Outputs and downstream inputs

<PyObject module="dagster" object="IOManager" />s are user-provided objects that
are responsible for storing the output of a solid and loading it as input to
downstream solids. For example, an IO Manager might store and load objects from
files on a filesystem. Each solid output can have its own IOManager, or multiple
solid outputs can share an IOManager. The IOManager that's used for handling a
particular solid output is automatically used for loading it in downstream
solids.

In the diagram below, an IOManager handles the teal boxes:

<p align="center">
  <img src="/images/concepts/io-managers.png" />
</p>

The default IOManager, <PyObject module="dagster" object="mem_io_manager" />, stores outputs in
memory, so it only works with the single-process executor. Dagster also provides out-of-the-box
IOManagers that pickle objects and save them: <PyObject module="dagster" object="fs_io_manager"/>,
<PyObject module="dagster_aws.s3" object="s3_pickle_io_manager"/>,
<PyObject module="dagster_azure.adls2" object="adls2_pickle_io_manager"/>, and
<PyObject module="dagster_gcp.gcs" object="gcs_pickle_io_manager"/>.

IOManagers are [resources](/concepts/resources/resource-definition), which means users can supply
different IOManagers for the same solid outputs in different situations. For example, you might use
an in-memory IOManager for unit-testing a pipeline and an S3IOManager in production.

---

## Defining an IO manager

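Besides the out-of-the-box IO managers, you can also define your own.

To define an IO manager, subclass <PyObject module="dagster" object="IOManager" />, implement `handle_output` and `load_input`, and use the <PyObject module="dagster" object="io_manager" displayText="@io_manager" /> decorator on a function that returns an instance of your class.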

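```python
from dagster import IOManager, io_manager


class MyIOManager(IOManager):
    def handle_output(self, context, obj):
        # write_csv is a placeholder for your own code that persists obj
        write_csv("some/path", obj)

    def load_input(self, context):
        # read_csv is a placeholder for your own code that loads the stored object
        return read_csv("some/path")


@io_manager
def my_io_manager(init_context):
    return MyIOManager()
```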

You can find more information about customizing an IO manager in
the [Examples](#providing-a-custom-io-manager) section below.

## Using an IO manager

### Pipeline-wide IO manager

By default, all the inputs and outputs in a pipeline use the same IOManager. This IOManager is
determined by the <PyObject module="dagster" object="ResourceDefinition" /> provided for the
`"io_manager"` resource key. `"io_manager"` is a resource key that Dagster reserves specifically
for this purpose.

Here’s how to specify that all solid outputs are stored using the <PyObject module="dagster" object="fs_io_manager" />,
which pickles outputs and stores them on the local filesystem. It stores files in a directory with
the run ID in the path, so that outputs from prior runs will never be overwritten.

```python file=/concepts/io_management/default_io_manager.py
from dagster import ModeDefinition, fs_io_manager, pipeline, solid


@solid
def solid1(_):
    return 1


@solid
def solid2(_, a):
    return a + 1


@pipeline(mode_defs=[ModeDefinition(resource_defs={"io_manager": fs_io_manager})])
def my_pipeline():
    solid2(solid1())
```

### Per-output IO manager

Not all the outputs in a pipeline should necessarily be stored the same way. Maybe some of the
outputs should live on the filesystem so they can be inspected, and others can be transiently stored
in memory.

To select the IOManager for a particular output, you can set an `io_manager_key` on
the <PyObject module="dagster" object="OutputDefinition" />, and then refer to that `io_manager_key`
when setting IO managers in your <PyObject module="dagster" object="ModeDefinition" />. In this
example, the output of solid1 will go to `fs_io_manager` and the output of solid2 will go to `mem_io_manager`.

```python file=/concepts/io_management/io_manager_per_output.py startafter=start_marker endbefore=end_marker
from dagster import ModeDefinition, OutputDefinition, fs_io_manager, mem_io_manager, pipeline, solid


@solid(output_defs=[OutputDefinition(io_manager_key="fs")])
def solid1(_):
    return 1


@solid(output_defs=[OutputDefinition(io_manager_key="mem")])
def solid2(_, a):
    return a + 1


@pipeline(mode_defs=[ModeDefinition(resource_defs={"fs": fs_io_manager, "mem": mem_io_manager})])
def my_pipeline():
    solid2(solid1())
```

## Examples

### Providing a custom IO manager

If you have specific requirements for where and how your outputs should be stored and retrieved, you
can define your own IOManager. For example, if your solids produce Pandas DataFrames that populate
tables in a data warehouse, you might write the following:

```python literalinclude file=/concepts/io_management/custom_io_manager.py startafter=start_marker endbefore=end_marker
from dagster import IOManager, ModeDefinition, io_manager, pipeline


class DataframeTableIOManager(IOManager):
    def handle_output(self, context, obj):
        # name is the name given to the OutputDefinition that we're storing for
        table_name = context.name
        write_dataframe_to_table(name=table_name, dataframe=obj)

    def load_input(self, context):
        # upstream_output.name is the name given to the OutputDefinition that we're loading for
        table_name = context.upstream_output.name
        return read_dataframe_from_table(name=table_name)


@io_manager
def df_table_io_manager(_):
    return DataframeTableIOManager()


@pipeline(mode_defs=[ModeDefinition(resource_defs={"io_manager": df_table_io_manager})])
def my_pipeline():
    solid2(solid1())
```

The <PyObject module="dagster" object="io_manager" /> decorator behaves nearly identically to
the <PyObject module="dagster" object="resource" /> decorator. It yields
an <PyObject module="dagster" object="IOManagerDefinition" />, which is a subclass
of <PyObject module="dagster" object="ResourceDefinition" /> that will produce
an <PyObject module="dagster" object="IOManager" />.

The provided `context` argument for `handle_output` is
an <PyObject module="dagster" object="OutputContext" />. The provided `context` argument for
`load_input` is an <PyObject module="dagster" object="InputContext" />. The linked API documentation
lists all the fields that are available on these objects.
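
As a rough illustration of how these context fields can be used, the sketch below builds a file path from the `run_id`, `step_key`, and `name` fields of the output context and pickles the object there. It is a plain-Python sketch rather than a working Dagster definition: `PicklePathIOManager` is a hypothetical name, it does not subclass <PyObject module="dagster" object="IOManager" />, and the `SimpleNamespace` objects are stand-ins for the contexts Dagster would normally pass in.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


class PicklePathIOManager:
    """Sketch only: a real implementation would subclass dagster.IOManager."""

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def _path(self, output_context):
        # One file per (run, step, output), derived from output context fields.
        return os.path.join(
            self.base_dir,
            output_context.run_id,
            output_context.step_key,
            output_context.name,
        )

    def handle_output(self, context, obj):
        path = self._path(context)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, context):
        # The input context points back at the output that produced the value.
        with open(self._path(context.upstream_output), "rb") as f:
            return pickle.load(f)


# Hypothetical stand-ins for the contexts Dagster would provide.
output_context = SimpleNamespace(run_id="run_1", step_key="solid1", name="result")
input_context = SimpleNamespace(upstream_output=output_context)

manager = PicklePathIOManager(tempfile.mkdtemp())
manager.handle_output(output_context, {"rows": 3})
loaded = manager.load_input(input_context)
```

Because `load_input` resolves the path through `context.upstream_output`, the downstream solid reads exactly the file that the upstream output wrote.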

### Providing per-output config to an IO manager

When launching a run, you might want to parameterize how particular outputs are stored.

For example, if your pipeline produces DataFrames to populate tables in a data warehouse, you might want to specify the table that each output goes to at run launch time.

To accomplish this, you can define an `output_config_schema` on the IO manager definition. The IOManager methods can access this config when storing or loading data, via the <PyObject module="dagster" object="OutputContext" />.

```python file=/concepts/io_management/output_config.py startafter=io_manager_start_marker endbefore=io_manager_end_marker
class MyIOManager(IOManager):
    def handle_output(self, context, obj):
        table_name = context.config["table"]
        write_dataframe_to_table(name=table_name, dataframe=obj)

    def load_input(self, context):
        table_name = context.upstream_output.config["table"]
        return read_dataframe_from_table(name=table_name)


@io_manager(output_config_schema={"table": str})
def my_io_manager(_):
    return MyIOManager()
```

Then, when executing a pipeline, you can pass in this per-output config.

```python file=/concepts/io_management/output_config.py startafter=execute_start_marker endbefore=execute_end_marker
@pipeline(mode_defs=[ModeDefinition(resource_defs={"io_manager": my_io_manager})])
def my_pipeline():
    solid2(solid1())


execute_pipeline(
    my_pipeline,
    run_config={
        "solids": {
            "solid1": {"outputs": {"result": {"table": "table1"}}},
            "solid2": {"outputs": {"result": {"table": "table2"}}},
        }
    },
)
```

### Providing per-output metadata to an IO manager (experimental)

You might want to provide static metadata that controls how particular outputs are stored. You don't plan to change the metadata at runtime, so it makes more sense to attach it to a definition rather than expose it as a configuration option.

For example, if your pipeline produces DataFrames to populate tables in a data warehouse, you might want to specify that each output always goes to a particular table. To accomplish this, you can define `metadata` on each <PyObject module="dagster" object="OutputDefinition" />:

```python file=/concepts/io_management/metadata.py startafter=solids_start_marker endbefore=solids_end_marker
@solid(output_defs=[OutputDefinition(metadata={"schema": "some_schema", "table": "some_table"})])
def solid1(_):
    """Return a Pandas DataFrame"""


@solid(output_defs=[OutputDefinition(metadata={"schema": "other_schema", "table": "other_table"})])
def solid2(_, _input_dataframe):
    """Return a Pandas DataFrame"""
```

In this case, the table names are encoded in the pipeline definition. If you instead want to set them at run time, see [Providing per-output config to an IO manager](#providing-per-output-config-to-an-io-manager) above.

The IOManager can then access this metadata when storing or retrieving data, via the <PyObject module="dagster" object="OutputContext" />:

```python file=/concepts/io_management/metadata.py startafter=io_manager_start_marker endbefore=io_manager_end_marker
class MyIOManager(IOManager):
    def handle_output(self, context, obj):
        table_name = context.metadata["table"]
        schema = context.metadata["schema"]
        write_dataframe_to_table(name=table_name, schema=schema, dataframe=obj)

    def load_input(self, context):
        table_name = context.upstream_output.metadata["table"]
        schema = context.upstream_output.metadata["schema"]
        return read_dataframe_from_table(name=table_name, schema=schema)


@io_manager
def my_io_manager(_):
    return MyIOManager()
```
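
To make the flow of metadata concrete, here is a small plain-Python sketch that exercises the same `handle_output` and `load_input` logic outside of Dagster. Everything around the two methods is an assumption made for illustration: the `WAREHOUSE` dictionary and the two helper functions stand in for a real data warehouse, the class does not subclass <PyObject module="dagster" object="IOManager" />, and the `SimpleNamespace` objects stand in for the contexts Dagster would build from the `metadata` on `solid1`'s output definition.

```python
from types import SimpleNamespace

# Hypothetical in-memory stand-in for a warehouse, keyed by (schema, table).
WAREHOUSE = {}


def write_dataframe_to_table(name, schema, dataframe):
    WAREHOUSE[(schema, name)] = dataframe


def read_dataframe_from_table(name, schema):
    return WAREHOUSE[(schema, name)]


class MyIOManager:  # sketch only: a real one would subclass dagster.IOManager
    def handle_output(self, context, obj):
        table_name = context.metadata["table"]
        schema = context.metadata["schema"]
        write_dataframe_to_table(name=table_name, schema=schema, dataframe=obj)

    def load_input(self, context):
        # Metadata is read from the upstream output, not from the input itself.
        table_name = context.upstream_output.metadata["table"]
        schema = context.upstream_output.metadata["schema"]
        return read_dataframe_from_table(name=table_name, schema=schema)


# Stand-ins for the contexts derived from solid1's output definition.
solid1_output = SimpleNamespace(metadata={"schema": "some_schema", "table": "some_table"})
solid2_input = SimpleNamespace(upstream_output=solid1_output)

manager = MyIOManager()
manager.handle_output(solid1_output, [{"id": 1}])
rows = manager.load_input(solid2_input)
```

A list of dictionaries is used in place of a Pandas DataFrame only to keep the sketch free of third-party dependencies.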

## Testing an IO Manager

<div className="w-full h-48 bg-gray-200 flex items-center justify-center">
  TODO
</div>
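
One possible approach, sketched here under several assumptions, is to call `handle_output` and `load_input` directly with stand-in context objects and check what the IO manager does with them. In this sketch, `TableIOManager` is a hypothetical class that receives its warehouse client as a constructor argument so a test can swap in a mock; it does not subclass <PyObject module="dagster" object="IOManager" />, and the `SimpleNamespace` contexts only carry the `config` field that the methods read.

```python
from types import SimpleNamespace
from unittest import mock


class TableIOManager:
    """Sketch only: a real implementation would subclass dagster.IOManager."""

    def __init__(self, warehouse):
        # Hypothetical warehouse client exposing write(table, obj) and read(table).
        self.warehouse = warehouse

    def handle_output(self, context, obj):
        self.warehouse.write(context.config["table"], obj)

    def load_input(self, context):
        return self.warehouse.read(context.upstream_output.config["table"])


def test_table_io_manager():
    warehouse = mock.Mock()
    warehouse.read.return_value = "stored rows"
    manager = TableIOManager(warehouse)

    # Stand-ins for the contexts Dagster would pass to the IO manager.
    output_context = SimpleNamespace(config={"table": "table1"})
    input_context = SimpleNamespace(upstream_output=output_context)

    # Storing an output should write to the table named in the output config.
    manager.handle_output(output_context, "rows")
    warehouse.write.assert_called_once_with("table1", "rows")

    # Loading the input should read from the same table.
    assert manager.load_input(input_context) == "stored rows"
    warehouse.read.assert_called_once_with("table1")


test_table_io_manager()
```

Passing the client in through the constructor is a design choice made for this sketch; it keeps the test independent of any real storage system.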
